BUG: fix parsing of ODF time values with comments #55324
Force-pushed: 38ef2df to 2bb23e6
pandas/io/excel/_odfreader.py (outdated)
    raise ValueError(f"Failed to parse ODF time value: {value}")
    h, m, s = parts.group(1, 2, 3)
    # ignore date part from some representations as both pd.Timestamp
    # and datetime.time restrict hour values to 0..23
ODS supports timedelta ([hh]:mm:ss). times_1904.ods is broken, see #55045. Maybe fix the file and do something like this?
    if h > 23:
        return pd.Timedelta(...)
    else:
        return cast(Scalar, datetime.time(...))
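A minimal sketch of the branch suggested above, under stated assumptions: the helper name `parse_odf_time_value` and its regular expression are hypothetical and are not taken from `_odfreader.py`, the input is assumed to look like `PT50H15M00S`, and `datetime.timedelta` stands in for `pd.Timedelta` so the example has no dependencies.

```python
import datetime
import re

# Hypothetical pattern for an ODF time-value such as "PT50H15M00S";
# the regular expression used by the real reader may differ.
_TIME_VALUE = re.compile(r"^-?PT(\d+)H(\d+)M(\d+)(?:[.,](\d+))?S$")


def parse_odf_time_value(value: str):
    """Return datetime.time for values in 0..23h, else datetime.timedelta."""
    parts = _TIME_VALUE.match(value)
    if parts is None:
        raise ValueError(f"Failed to parse ODF time value: {value}")
    h, m, s = (int(x) for x in parts.group(1, 2, 3))
    # Pad or truncate the optional fractional seconds to microseconds.
    us = int((parts.group(4) or "0")[:6].ljust(6, "0"))
    negative = value.startswith("-")
    if h > 23 or negative:
        # Durations of 24h or more, or negative ones, cannot be a datetime.time.
        delta = datetime.timedelta(hours=h, minutes=m, seconds=s, microseconds=us)
        return -delta if negative else delta
    return datetime.time(h, m, s, us)


print(parse_odf_time_value("PT12H30M00S"))
print(parse_odf_time_value("PT50H15M00S"))
```

Returning two different types from one reader is a design choice the thread leaves open; the sketch only illustrates the `h > 23` split, not what the PR implements.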
Right, timedelta seems to make more sense than timestamp - at least for ODF time-value, because durations can be larger than 24h and they can be negative; see duration.ods:
[screenshot of duration.ods]
It's my first PR here, so I thought I'd better keep it tight and clean - hoping we could leave timedelta for a follow-up discussion and a new PR. The unit tests would certainly need adjustments (they specifically require datetime.time timestamps), and I don't know how other spreadsheet formats correspond.
Am I correct in saying that the current behavior of pandas will read the top value in the screenshot above as having 50 hours, whereas with this change it will be 50 - 48 = 2 hours?
@rhshadrach Which commit/version of pandas yields 50 hours for you?
I thought you'd get an error in _odfreader.py:217 similar to:
>>> pd.Timestamp('50:15:00')
Traceback (most recent call last):
  File "parsing.pyx", line 681, in pandas._libs.tslibs.parsing.dateutil_parse
ValueError: hour must be in 0..23
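The hour restriction quoted in the traceback can be reproduced with the standard library alone. This is a small sketch; the helper name `describe` is hypothetical and only illustrates that `datetime.time` rejects an hour of 50 while `datetime.timedelta` accepts it.

```python
import datetime


def describe(hours: int, minutes: int) -> str:
    """Show how the same value behaves as a time of day and as a duration."""
    try:
        return f"time: {datetime.time(hours, minutes)}"
    except ValueError as exc:
        # datetime.time only accepts hours in 0..23; a duration has no such limit.
        delta = datetime.timedelta(hours=hours, minutes=minutes)
        return f"rejected as time ({exc}); as timedelta: {delta}"


print(describe(2, 15))
print(describe(50, 15))
```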
> Which commit/version of pandas yields 50 hours for you?
I haven't run anything.
> I thought you'd get an error in _odfreader.py:217 similar to:
I think you're saying that both main and this PR will raise on duration.ods, is that right?
Nope, I only said that the current implementation does not support time values equal to or larger than 24 hours in ODF files.
Could you please run the files you'd like to check and share any results that raise your concerns?
> I only said that the current implementation
Does "current implementation" mean main or this PR?
Force-pushed: 2bb23e6 to 7106b2b
Force-pushed: 7106b2b to 6ebe03a
Force-pushed: 6b896e3 to a493bb7
On a493bb7, I've refactored the helper function into
Force-pushed: 8a32292 to 7c22815
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Force-pushed: 7c22815 to ea23a2e
@rhshadrach @mroeschke @dimastbk Is there anything else I should do on this PR?
mroeschke
left a comment
Looks OK to me
Just so you're aware, by force pushing you make it so that reviewers can no longer use the "Show changes since your last review" feature. Not a big deal at all here because the diff is so small.
rhshadrach
left a comment
I have some questions - see the above review comments.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.
@rhshadrach @mroeschke @dimastbk Thank you for reviewing this PR and for all your comments. I'm sorry it didn't work out:
[...] time-value cells test_1900.ods and test_1904.ods fixtures, so that io/excel/test_readers.py:test_reader_seconds() would be failing without this fix. Also fixed missing microseconds there (see p.1 in BUG (test): bad file for testing ODFReader #55045).
[...] doc/source/whatsnew/v2.2.0.rst file.
Related to: #55045 (test files updates)